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Abstract: In many situations it is important to be able to propose N independent 
realizations of a given distribution law. We propose a strategy for making N parallel Monte 
Carlo Markov Chains (MCMC) interact in order to get an approximation of an independent 
A/-sample of a given target law. In this method each individual chain proposes candidates for 
all other chains. We prove that the set of interacting chains is itself a MCMC method for the 
product of N target measures. Compared to independent parallel chains this method is more 
time consuming, but we show through concrete examples that it possesses many advantages: 
it can speed up convergence toward the target law as well as handle the multi-modal case. 
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Methode de Monte Carlo par chaines de Markov 
en parallele et en interaction 

Resume : Dans de nombreuses situations il est important de pouvoir disposer de TV 
realisations independantes d'une loi donnee. Notre but est de developper une strategic 
d'interaction de TV methodes de Monte Carlo par Chaine de Markov (MCCM) dans le but 
de proposer une approximation d'un echantillon independant de taille TV d'une loi cible 
donnee. L'idee est que chaque chaine propose un candidat pour elle-meme mais egalement 
pour toutes les autres chaines. On montre que l'ensemble de ces TV chaines en interaction 
est lui-meme une methode MCCM pour le produit de TV mesures cibles. Cette approche est 
naturellement plus couteuse que TV chaines independantes, on montre toutefois au travers 
d'exemples concrets qu'elle possede plusieurs avantages : elle peut sensiblement accelerer la 
convergence vers la loi cible, elle permet egalement d'apprehender le cas multimodal. 

Mots-cle : methode de Monte Carlo par chaine de Markov, Metropolis-Hastings, chaines 
en interaction, approximation particulaire 
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1 Introduction 

Markov chain Monte Carlo (MCMC) algorithms [19, 12, 18] allows us to draw samples from 
a probability distribution n(x) dx known up to a multiplicative constant. They consist in 
sequentially simulating a single Markov chain whose limit distribution is n(x) dx. There exist 
many techniques to speed up the convergence toward the target distribution by improving 
the mixing properties of the chain [13]. Moreover, special attention should be given to the 
convergence diagnosis of this method [1, 6, 15]. 

An alternative is to run many Markov chains in parallel. The simplest multiple chain 
algorithm is to make use of parallel independent chains [9]. The recommendations concerning 
this idea seem contradictory in the literature (cf. the "many short runs" vs "one long run" 
debate described in [10]). We can note with [11] and [18, §6.5] that independent parallel 
chains could be a poor idea: among these chains some may not converge, so one long chain 
could be preferable to many short ones. Moreover, many parallel independent chains can 
artificially exhibit a more robust behavior which does not correspond to a real convergence 
of the algorithm. 

In practice one however make use of several chains in parallel. It is then tempting to 
exchange information between these chains to improve mixing properties of the MCMC 
samplers [4, 5, 16, 3, 7, 8]. A general framework of "Population Monte Carlo" has been 
proposed in this context [14, 17, 2]. In this paper we propose an interacting method between 
parallel chains which provides an independent sample from the target distribution. Contrary 
to papers previously cited, the proposal law n our work is given and does not adapt itself to 
the previous simulations. Hence, the problem of the choice of this law still remains. 

The Metropolis-Hastings (MH) algorithm and its theoretical properties are presented in 
section 2. The corresponding Metropolis within Gibbs (MwG) algorithm and its theoretical 
properties are presented in section 3. In Section 4, two simple numerical examples illustrate 
how the introduction of interactions can speed up the convergence and handle multi-modal 
cases. 

2 Parallel/interacting Metropolis Hastings (MH) algo- 
rithm 

Consider a target density law n(x) defined on (R",£>(R™)) and a proposal kernel density 
ir prop (y\x) . We propose a method for sampling N independent values X 1 , . . . , X N G R™ of 
the law ir(x) dx. 

Notations: Let 

X = X 1:N = X\- n G W IXN , 

so that Xg e M. N and X % <E R™ (the same for Y and Z); x G R n so that xj£l (the same for 
y and z); G R. Here X 1:N = (X 1 , . . . , X N ) and X 1:n = (X u . . . , X n ). We also define 
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->£ = {1, . . . ,n}\ {£}. Note that the structure of the matrix X is: 

X 1 

T 



x\ ■■■ x\ 



X{ 



N 



X = 



X) 



XI 



X, 



N 



x e 



xl. ■ ■ ■ XI, ■ ■ ■ X, 



N 



2.1 The algorithm 

We describe the Markov chain {X^}k>o over R nxJV corresponding the MH algorithm. It 
consists in TV mutually dependent realizations X 1 '^ (i = 1, . . . , N) of the state variable and 
its limit distribution will be 

Il(dX) = niX 1 ) dX 1 --- n(X N ) dX N . 

We detail an iteration X^ = X -> X( fe+1 ) = Z of the MH algorithm. The N vectors are 
updated sequentially: 

[x UN ] - [z l x 2 -- N ] -> [Z^X 3 ^] ■ ■ ■ [z 1:N - 1 x N ] - [Z 1:N ] . 

At sub-iteration "i", that is [Z^^X™] [Z 1:i X i+1:JV ], we simulate in two steps: 

Proposal step: independently one from the other, each chain j = 1 • • • N proposes a can- 
didate Y 3 £ W" 1 according to the proposal kernel starting from its current position, 
i.e. 

Y* ~„*j>( v \Z 1:i - 1 ,X i ,X i+1:N )dy. 
Note that the candidates Y J depend also on i. We will use a lighter notation: 

^(y\X i )=n^(y\Z 1 -- i -\X i ,X i + 1 -- N ). (1) 
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Selection step: We can chose among these N candidates Y 1:N or stay at X % according to 
the multinomial law: 

Y 1 with probability ± a^{X\ Y 1 ) , 

Y N with probability ± a i ' N (X i , Y N ) , 
X 1 with probability ~ P l (X\ Y) 
where the acceptance probabilities are 

1 N 

- p i {X \Y)^l--Y J ^{X\Yi). 



N ■ 



The final algorithm is depicted in Algorithm 1. 



choose leR"" 
for fc = 1, 2, . . . do 
for i = 1 : N do 
for j = 1 : TV do 

~w%*(y\X i )dy 
a? - [7r(^>[7(^|^)]/[7r(X07rf7(^|X')] A 1 
end for 

1 7 



Y 1 with probability a 1 /^ 



X 1 



end for 
end for 



Y N with probability a N /N 
X' 1 with probability p 



Algorithm 1: Parallel/ interacting MH algorithm. 



2.2 Description of the MH kernel 

Lemma 2.1 The Markov kernel associated with the MH procedure described in Section 2.1 
is 

P(X; dZ) = P\X 1:N ; dZ 1 ) P 2 {Z 1 , X 2:N '■ dZ 2 ) ■ ■ ■ P N (Z 1:N ~ 1 , X N ; dZ N ) (2) 
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where 

N 



P\Z^-\X l -- N -Az) d ^l-J2a^(X\z) K7(AX l ) dz + p'iX*) 5 xi (dz) . (3) 
Acceptation probability is 



N ■. 



(x z) = { ^ (X ' Z) A 1 if ix > Z) G ^ ' (4) 
[0 otherwise, 

r ,:, (a;, z) = ^4 — , , x , (5) 



iV ,=1 •>« 



(6) 



TTie sei R l,J is defined by: 

Ri'i = {(x, z) G K" x R" ; n(z) tt[;/ p (x|z) > and ir(x) ^{z^) > 0} . 
Note that the functions a l ' j (x,z), p l {x), r l ^(x,z) and the set R 1 ^ depend on and 

x i:N . 

The measures 

v(dx x dz) = ir(z)irl'° p (x\z)dzdx, v T {dx x dz) = n(x) ir P '° p (z\x) dz dx 

are mutually absolutely continuous over R 1 ^ and mutually singular on the complementary 
set [R l ^] c . The set R % ^ is unique, up to the v and v T negligible sets, and symmetric, i.e. 
(x,z) e R iJ => (z,x) e R iJ . 



Proof This construction follows the general setup proposed by Luke Tierney in [20]. We 
now derive the probability kernel associated with the iteration described in the previous 
subsection 2.1. The kernel P l (Z 1:l ~ 1 , X l '- N ; dz) is the composition of a proposition kernel 
and of a selection kernel: 

P\Z 1:t -\X v - N -dz) = ( S l (Z 1:l -\X l -- N ,Y 1:N ;dz) Q^Z 14 ' 1 , X t:N ■ dY VN ) 
Jy in 

which consists in proposing independently N candidates Y 1:N sampled from the density 
proposition, i.e. 

N 

Q'iZ^-^X^-dY 1 ^) = \\^J{Y k \X % )dY k 

fe=i 
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then to select among these candidates or to stay at X 1 with the MH acceptance probability, 
i.e. 



1 N 

S' l (Z v - l -\X v - N , Y 1:N - dz) = — J2 <j' j {X\ Yi) 5 Yi (dz) + ~p\X\ Y) 5 xi {dz) . 



Hence: 

P l {Z 1:l -\X v - N ;dz) = 

N N 



J YiN a i ' j (X i ,Y^6 Yj (dz){f[^(Y k \X i )dY k } 

j ' — 1 k — 1 

r N 

+ p i (X i ,Y)5 xi (dz){H^"(Y k \X i )dY k ]=A 1 +A 2 



and 



/ \X[K7(Y k \X*)dY k X dY> 



= ^Y t ^{X\z)^{z\X i ) dz 
because J Yj S Y j(dz) dY^ = dz. The second term A 2 reads: 

N 



A 2 = [ p\X\Y)5 x ,(dz) { f\^{y k \X i )AY k ) 

Jyl-.S k=l ' J 

r 1 N N 

= S xi (dz) J^ N [l--Y j a^{X\Y^)} {n<r(^l^)d^ fe } 

j ' — 1 k.—l 
N ,. N 

= s xi (dz) {i-^E / « <J '(^,^') n<r(^i^)dr fc } 

j=l J Y 1N k =l 

= s xi (dz){i--Y^ / ^(^,^)<7(^|^)dy^}. 

i=i Jyj 

Summing up A\ and A 2 proves the Lemma. □ 
RR n° 0123456789 
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2.3 Invariance property 

Lemma 2.2 For all (x,z) 6l"x W 1 a.e. we have: 

z) tt(x) ttI'° p (z\x) = a iJ (z, x) n(z) < r / p (a;|z) . 

Proof For (a;, z) £ the result is obvious. For (x, z) G we have: 

(r ij (x,z)Al)7r(x) ir\J>{z\x) 

= min{7r(z)7r[J p (x|z) , ir{x) ir^ {z\x)} 
= (r* J (z, x) A 1) tt(z) tt p ;; p (x|z) . 

□ 

Lemma 2.3 (conditional detailed balance) The following equality of measures defined 
on R n x MP 

P^Z 1 ' 4 ' 1 , X i:N ; dZ l ) ttCJC*) dX i = P\Z 1: \ X l+1:N '■ dX r ) k{Z 1 ) dZ l (7) 
holds true for any i = 1, . . . , N, Z 1 '^ 1 e R( i_1 ) xJV ; and X l+1:N e #- l)xAr . 

Proof Left hand side of (7) is a measure, say v(dZ l x dJP) on (M™ x K",B(K" x M™)). 
For all Ai, A 2 € B(R"), we want to prove that x A 2 ) = v(A 2 x Ai). We have: 

v(A 1 xA 2 ) = J P i {Z VA - 1 ,X kN ;A 1 )\ A AX i )^{X i )dX i 

and 

iV 

P^Z 1 ^- 1 ,^^;^!) = j^J2 ^A 1 {Z l )a l ^(X\Z l ) n\J(Z l \X l ) dZ l 

+ p i (X i )l Al (X i ) 

so that 

x A 2 ) 
1 n 

= Tr/Z // U 1 {Z i )l Aa (X i )a i '*(X i ,Z i )w(X i )w%*(Z i \X i )dX i dZ i 

3=1 J J 

+ J p^X l )\ Al (X*)l A2 (X l )K{X l )dX\ (8) 
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And from Lemma 2.2, we get: 
v{A x x A 2 ) 



1 N 



N ■ , 



Exchanging the name of variables X 1 <-» Z J in the first term of the right hand side of the 
previous equality, leads to the same expression as (8) where A\ and A2 were interchanged, 
in other words v(Ai x A 2 ) = v{A 2 x A\). □ 

Proposition 2.4 (invariance) The probability measure 

Il(dX) = ^(X 1 ) dX 1 • • • n(X N ) dX N 
is an invariant distribution of the Markov kernel P, i. e. HP = H that is: 

N N 



^ x »=1 i=l 



Proof 

JV 



f P(X,dZ) {f[n(X*)dX 1 } 

^ x i=i 

= / P 1 (X 1:N ;dZ 1 )P 2 (Z 1 ,X 2:N ;dZ 2 )--- 
Jx N 

■ ■ ■ P N (Z 1:N - 1 ,X N ; dZ N ) { J] n(X*) dX 1 } 

i=l 

= f P^X^-^Z^^X^dX 1 P 2 (Z\X 2 -- N ;dZ 2 )--- 
Jx N 

■ ■ ■ P n (Z 1:N -\X N ; dZ N ) { Y[ ir{X l ) dX'} . 

i=2 

Using (7) with i = 1 gives: 

r N 

= / P 1 (Z\X 2 - N ;dX 1 )TT(Z 1 )dZ 1 P 2 (Z\X 2 - N -dZ 2 )--- 
Jx N 

■ ■ ■ P n (Z 1:N -\X N ; dZ N ) { Y[ A xi ) d*'} • 



(9) 
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In this last expression the kernel P 1 (Z 1 , X 2:N ; dX 1 ) is a measure on the variable X 1 which 
no longer appears in the integrand. Therefore its integral with respect to this variable is 1, 
hence: 



L 



P(X,dZ) []Jn(X')dX^ = 

i=l 

= n(Z 1 )dZ 1 f P 2 (Z\X 2 - N -dZ 2 )--- 

■ ■ ■ P n (Z l -- N -\X N : dZ N ) { J] d^} • 



i=2 

'2 + „ Y N 



Repeating this procedure successively for X to X leads to (9). □ 

3 Parallel/interacting Metropolis within Gibbs 
(MwG) algorithm 

Let ir(x) be the probability density function of a target distribution defined on (R™, B(R n )). 
For £ = 1, . . . , n, we define the conditional laws: 

Tre(x e \x^) = -j— — . (10) 

When we know to sample from (10), we are able to use the Gibbs sampler. It is possible 
to adapt our interacting method to parallel Gibbs sampler. But very often we do not know 
how to sample from (10) and therefore we consider proposal conditional densities TTg' op (xe) 
defined for all i. In this case, we use Metropolis within Gibbs algorithm (see appendix). 
We present in the following how to make interactions between parallel MwG algorthims. 
The MwG algorithm is more general than Gibbs algorithm, so a parallel/interacted Gibbs 
algorithm can easily be deduced from the parallel/interacted MwG algorithm. 

3.1 The algorithm 

One iteration X — > Z of the parallel/interacting Metropolis within Gibbs method consists 
in updating the components Xi successively for t — 1, . . . , n, i.e. 

[-X"l:n] ~~ * [Z\X2: n \ — ► [-^l:2^3:n] ' ' ' [Zl;n-lX n ] — > [Zi- n ] . 

For each I fixed, the subcomponents X\ are updated sequentially for i = 1, . . . , N in two 
steps: 

(i) Proposal step: We sample independently N candidates Yg e M. for j = 1 : N according 
to: 

Yl ~ 4-° p {i\ \Z, X\ ,X}\)6$, l<j<n 
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where 











zir 1 




Zi-.e-i 


i 










v 'n 

I 





We also use the following lighter notation: 

(ii) Selection step: The subcomponent X\ could be replaced by one of the N candidates 
Yg' N or stay unchanged according to a multinomial sampling, the resulting value is 
called Z\, i.e.: 

1 



where: 



' Y} with probability ± a % f(X}, F/) , 



Y e N with probability ± a\ ,N (X\ , Y e N ) , 
[ X] with probability p^X*, F/ :JV ) 



i=i 

The resulting algorithm is depicted in Algorithm 2. 
3.2 Description of the MH kernel 

Lemma 3.1 The Markov kernel on M. nxN associated with the MH algorithm described in 
Section 3.1, is 



P(X, dZ) = P 1 (X 1:n ; dZ x ) P 2 (Z 1 ,X 2:n ;dZ 2 ) ■ ■ ■ P n (Z 1:n _ u X n ;dZ n ) . 



(11) 



At iteration t, the kernel P({Z\.t-\, X^ n \ dZg) generates Zj :N from the already updated com- 
ponents Z\:f_ x and the remaining components X}':^ . 

Each component Z\. t , for i = 1 ■ • • N, is updated independently one from each other: 



N 



P t {Z 1 .. l _ u X l .. n - dZ t ) ^ H Pi(fZ, XI X}}; dZ\) . 



(12) 
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choose X eR nxN 
for k = 1,2, . .. do 
for £ — 1 : n do 
for z = 1 : iV do 
for 9 1 = 1 : AT do 

A 1 



'' ' ^(XIIX*,) ^;p'°p(F/|X|) 
end for 

Yf with probability a 1 /N 

Y e N with probability a N /N 
X\ with probability p 



end for 
end for 
end for 



Algorithm 2: Parallel /inter acting MwG. 



Here Z\ is generated from fZ, X%, XJg according to: 

p;(Pu,A^;do d = f ^f>; j '&o <r p (£'io d^'+ P j(o W) (is) 

Acceptation probabilities are: 

af( £ ,e)W r '" ( "' ,A1 ,/( «' )eR ' i ' (i4, 

I otherwise, 

JV 

7V j=l "'K 
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Finally, R^ J is the set of ordered pairs (£,£') e K 2 such that 

nt(Z\Zi :t _ 1 ,X} +1 .j4j'°''{?\t)>0. 

Note that the functions af (£,£')> Pe(£)> r ^ J (£j£') aTl ^ ^ e se t P^i depend on Z\,n_\ and 
Xg+i; n . 

Proof This construction follows the general setup proposed by Luke Tierney in [20]. The 
kernel is defined by: 

i?([Z,£,X]j;dO = / Si(lZ^,Xli,C 1:N -,d^)xQl(lZ,^Xii-,d<: 1:N ) . 

JR NS v ' V v ' 

selection kernel proposal kernel 

This kernel consists firstly in proposing a population of N candidates C 1:JV S sampled 
from: 

N 

Q\(iz, t, xf 6 dC 1:Ar ) = J] <T P (C j 10 d C J , (17) 

3=1 

then secondly in selecting among these candidates or rejecting them according to a MH 
technique, i.e. 

1 N 

s\{\z, e, xj\, c l:JV ; ao d = f - c j ) h (<tf) + m, c 1:Ar ) W) (is) 

3=1 

where is given by (14) and C 1:Ar ) = 1 - ^ Ejli "^(C, C J )- 
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Hence: 



N N 

j=i J C 1N fe=i 

1 N r N 

+ { 1 -m11 / 4 ,j (^c j ) n<r op (c fc iOdc fe }<5«(dO 

J= i J C 1JV fe= i 

JV 

= ^E / <(^c j ')^(dO^r p (c j i0dc j 

+ 1 1 - 4E / «^>cVr p (c J 'io dc j } W) 

j=i J C 3 

= ^E a ^>r op (£'io 

1 w r 

+ i 1 ^E dr} W) 

which correspond to Equations (13) to (16). □ 

3.3 Invariance property 
Lemma 3.2 For almost all (£,£') € M 2 : 

/or an?/ £, i, j, {Z^^, X} +1 . n ) , and {Z{. t _^X\ +v J. 

Proof For (£,£') £ R% j , the result is obvious. For (£,£') G a.e.: 
(rf (£,£') A 1) ^^1^^!, Xj +1:n ) 7rr P K'IO 

= mm{^(c'i^ : .-i,^ + i : jTr p (eie') , ^(ci^^-i,^ + i : j^ prop (^o} 

= (rf (£',£) A 1) 7r,(e'l^ : .-i,^ + i : J 7rr P (£lO • 

□ 
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Lemma 3.3 (conditional detailed balance) The following equality of measures defined 
onRxR 

PHIZ, Z, Xf 6 AO x Tr^l^.i, X} +1:n ) d£ 

= Pi(lz,t',X\\;dt) x nWziu^xi+^dt' (19) 

holds true for any I = 1 • • -n, i = 1 • • • N and Z w _i e M JVx < £ - 1 ), X e+1:n e R Nx ( n ~ e \ 

Proof The left hand side of equality (19) is a measure v(d? x d£) defined on (M 2 , B(R 2 )). 
For all Ai, A 2 e B(R), we want to prove that u(Ai x A 2 ) = v{A 2 x Ax). 
We have: 

"(Ai x A 2 ) = J Piaz^^X^A^lA^Onemi.^X^Jdt 

and 

^([z.^x]';^) = wor^o ^rtfi® de+ P m wo 



3 = 1 

so that 

N 



v(a 1 xa 2 ) = ±£ //wowo^-o 

*t K I 3:/-! > *i+l:n) ^ 10 ^ <tf 

+ /pkowowo^i^-i, xi+i-jdn (20) 

Using Lemma 3.2 we get: 



N 

v(A 1 xA 2 ) = ±<r, //wo wo <wo 

.7 = 1 ^ ^ 



^(Cl^-l.^j+l: J ^ r ° P (elO dC d£ 
+ J pKO WO WO ^(£1^-1,^+1: J d£ 

Exchanging the name of variables £ <-> £' in the first term of the right hand side of the 
previous equality leads to the same expression as (20) where A\ and A 2 were interchanged, 
in other words v{A\ x A 2 ) = v{A 2 x Ax). □ 

Proposition 3.4 (invariance) The measure 

Il(dX) = v(X x ) dX 1 ■ ■ ■ ir(X N ) dX N 

is invariant for the kernel P, that is HP = H i.e.: 

,. N N 

/ P(X,dZ) [Y[ir(X l ) dX 1 } = Y[ir{Z l )dZ l . (21) 

•* x i=l i=l 
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Proof 

N 



f P(X,dZ) {f[ir(X l )dX 1 } 

J x i=l 

= / Pi(Xi :n ; dZ\) P 2 {Z\, X 2:n ; dZ 2 ) ■ ■ ■ P n (Zi :n _i, X n ; dZ n ) 

JX N 

Hi^iXllXUdXl ^{Xl n )dXl n } 

i=\ 

r N 
= / p^x^dz^ {U^ixiixwixi} 

x *=1 N 

P 2 (Z 1 ,X 2 .. n ; dZ 2 ) ■ ■ ■ P n (Z 1:n -i,X n ; dZ n ) { [] 7r^(X/j : J d*j :n } 

i=l 

r N N 

= / {i[piaz,xixi\;dzi)} {HMxuxudxi} 

JX i=l i=l TV 

P 2 (^i, X 2: „; dZ 2 ) • • • P n (Zi :n _!, X„; dZ n ) { JJ 7r^i(X* :n ) d^ :n } 

Moreover 

JV 

P 1 (X 1:n ;dZ 1 ) {H^iXilXijdXi} = 

i=i 

N N 

= { n pi ( \z,x[ ,xj\ ■ dz\ ) } { n ti (x« 1 j dxj } 

i=i i=i 

TV 

= [] P*([Z, , Xj\ ; dZj) 7T! (Xj \Xi J dXJ 

TV 

= n piaz, z\ , xn- dx\ ) tti (zj |x* :n ) 
i=i 

this last equality follows from Equation (19). Hence, 

r N 

/ P(X,dZ) {Y[w(X i )dX i } 

■'x i=l 

r N 

= J \{{Pl{lZ,Z\,Xt ] dXl)T: 1 {Z\\Xl.JdZ\)p 2 {Z l ,X 2 .. n] dZ2)--- 

X i=1 N 

■ ■ ■ P n (Z 1:n ^,X n ; dZ n ) { [] ^i(^ :n ) dX« :n } 
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In this last expression, for i = 1, . . . , N, the kernel P{(\Z, Z\, X}\; dX{) is a measure for the 
variable X\ which no longer appears in the integrand. Using the fact that the integral of 
the kernel w.r.t. X\ is 1 we get: 



f P(X,dZ) {flniX^dX*} 

^ x »=1 
N 

J] {^{ZilXijdZl} P 2 (Z 1 ,X 2:n ;dZ 2 ) ■ ■ ■ 

2-N i=1 N 

■ ■ ■ P n {Z v . n _ u X n - dZ n ) { I] ^i(XL) d^2:„} 

1=1 

N 

= / \{P2{Z 1 ,X 2 .. r -dZ 2 )--- 

JX 2 .N i=1 ^ 

• • • P„(Zi :n _ 1) X n ; dZ„) { f[ 7r(^X* :n ) dZj dX l 2 . n } 
Repeating this process successively for X 2 to X n leads to (21). □ 

4 Numerical tests 

4.1 A multi-modal example 

We apply now the parallel/interacting Metropolis-Hastings sampler, see Section 2, to a case 
where the target distribution is multimodal: 

7T = pi JV(Ci , /) + vi M{C 2 ,1)+P3 Af(C 3 , 1) 

with pi = 0.1, p 2 = 0.3, p 3 = 0.6, and Ci = (-10, -10), C 2 = (5, 0), C 3 = (-5, 5). It is a 
mixture of 3 two-dimensional Gaussian densities. 

We describe the proposal kernel (1), for updating the component X 1 , each chain j propose 
a new candidate according to the following distribution law: 



M{Xi,I), ]£i = j 



where d= \X l - X'\. 

The idea here is to explore the space with a Gaussian random walk (i — j) but also to 
allow "jumps" toward already explored interesting areas (i ^ j). If X 1 and X 7 are close one 
the other, then "the chain j will propose a candidate far from X j and X™. If X % and X' 
are far one to the other, then the "chain j will propose a candidate close to X^\ 
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Figure 1: Target distribution n(x) (left) and initial positions of the chains X' '' 4 , for i — 
1---N (right). 
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Figure 2: Positions of the chains X^' 2 , for i — 1 • • • N, at iterations k — 1000 (left) and 
k = 5000 (right). 
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Here N = 50, and the initial points X^ ^*, for i = 1 • • • N, are sampled according to the 
uniform law on the square [—15, 10] x [0, 10], see Figure 1 (right). Figures 2 clearly demon- 
strate the convergence of the method. In Figure 3 we present the evolution of the proportion 
of particles located in the neighbor of the three different modes: this also demonstrates the 
good behavior of the method. Note that the initial particles do not cover the mode number 
2, so the algorithm is able to reach the isolated mode and to balances the particles among 
the modes according to the parameters Pj. 



4.2 An hidden Markov model 

We apply the parallel/interacting Metropolis within Gibbs sampler, see Section 3, to a toy 
problem where a good estimate if of the target distribution ir is available. Consider the 
linear Gaussian state space model: 

S£+i=as£+W£, si ~ W(si, Qi) , (22a) 

y£ = bs t +v f (22b) 

for t = 1 • • -n, where Wi :n and vi :n are centered white Gaussian noises with variances ct 2 
and a 2 . Suppose that b is known and a = 9 is unknown with a priori law Af(/j,g,a%). We 
also suppose that wi :n , vi :n , si and 9 are mutually independent. 

The state variable is 

Xl:n+1 = (Sl:n, 0) 

and the target conditional density is 

7r(a;i: n+ i) da;i :n+ i = 7r(si :n ,i?) dsi :n M = law(si :n , 6\yi :n = yi :n ) ■ 

This target law is not Gaussian, but we can perform a Gibbs sampler. Indeed the marginal 
conditional laws are available: 

7Ts f (si\s^£, d) ds e = law(s^|s^ 
7re(^|si:n)di? = law(0|si :n 

with 

f 2 = (^ + ^^)- 1 , 
We will perform three algorithms: 

(i) N parallel/interacting Metropolis within Gibbs samplers (Alg. 2), 

(ii) N parallel/independent Metropolis within Gibbs samplers (Alg. 3), 



= s^ e , = #, y 1:n = y 1:n ) = AT{m e , r 2 ) , 

= Sl :n , yi:„ = Vl-.n) = AT(m, f 2 ) 

rn^r 2 (^ + ^1 + ^1), 
m = f 2 £? = a *~ lfl - 
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CPU time (sec.) 



Figure 4: Evolution of the indicator e k , see (23), for the parallel/independent MwG sampler 
(- -), and for the parallel/interacting MH sampler (-). This evolution is depicted as a 
function of the CPU time and not as a function of the iteration number k. The residual 
error of about 0.22 for the second method is due to the limited size of the sample. 
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Figure 5: Evolution of the indicator e k , see (23), for the parallel/independent MwG sampler 
(- -). After 5000 sec. CPU time, the convergence of this method is still unsatisfactory. 
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(Hi) N Gibbs parallel/independent Gibbs samplers. 

Our aim is to show that making parallel samplers interact could speed up the convergence 
toward the stationary distribution. 

Because of its good convergence property, method (Hi) is considered as a reference 
method. Here we perform k = 10000 iterations of N Gibbs = 5000 independent Gibbs sam- 
plers. We obtain a kernel density estimate n of the target density based on the N Gibbs = 5000 
final values. Let n xe be the corresponding £-th marginal density. 

For methods (i) and (ii) we perform N ~ 50 parallel samplers. Let 7r int ' fc and 7r ind ' fc be 
the kernel density estimates of the target density based on the final values of methods (i) 
and (ii) respectively. Let 7rj."'' fc and 7rj," d ' fc be the corresponding £-th marginal densities. 

The parameter values for the simulations are a = 2, b = 2, cr^ = 9, = 25, si ~ 7V(4, 9), 
9 ~ M(l,A) and n = 10. 

For each algorithm (i) and (ii), that is for w k e = 7rj ( " d,/c and 7rj."'' fc , we compute 



Hence e\ is an estimation of the L 1 error between the target probability distribution and 
its estimation provided by the algorithm used. To sum up the information of the n = 10 
indicators we consider their mean: 



These estimations are based on a sample of size N = 50 only, so they suffer from variability. 
This is not problematical, indeed we do not want to estimate L 1 errors but to diagnose the 
convergence toward the stationary distribution. So we use e k t as an indicator which must 
decrease and remain close to a small value when convergence occurs. 

To compare fairly the parallel/independent MwG algorithm and the parallel/interacted 
MwG algorithm, we represent on Figures 4 and 5 the indicator e k for each algorithm not as 
a function of k but as a function of the CPU time. 

In Figure 4 we see that even if one iteration of algorithm (i) needs more CPU than one 
of (ii), still the first algorithm converges more rapidly than the second one. The residual 
error of 0.22 is due to the limited size of the sample. This error decreases to as N j oo. 
Figure 5 shows the inefficiency of parallel/independent MwG on this simple model. 

5 Conclusion 

This work showed that making parallel MCMC chains interact could improve their conver- 
gence properties. We proved the basic properties of the MCMC method, we did not prove 
that the proposed strategy speeds up the convergence. This difficult point is related to the 
problem of the rate of the convergence of the MCMC algorithms. 
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(23) 



24 



F. Campillo & V. Rossi 



Through a simple example we saw that the Metropolis within Gibbs strategy could be 
a poor strategy. However this method is widely used in practice on more complex non 
linear models. In this situation our strategy improved the convergence properties. We also 
demonstrated that this approach can handle multimodal cases. 
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Appendix: MwG algorithm 

One iteration X —> Z of the Metropolis within Gibbs method consists in updating the 
components Xe successively for £ = 1, . . . , n, i.e. 

[Xl-.n] — > [ZiX 2 :n] — * [-^l:2^3:n] ' 1 ' [^l:n-l-X"n] — ► [^l:n] • 

Each components Xe is updated in two steps: 

(i) Proposal step: We sample a candidate Yg according to: 

Yz~irY° p (0 d£ 

(ii) Selection step: The component Xi could be replaced by the candidate Ye or stay 
unchanged according to a binomial sampling, the resulting value is called Ze, i.e.: 

{Ye with probability ae(Xe,Ye) , 
X e with probability 1 — ae(Xe,Ye) 



where: 



"A?.? ) = — 777 prop/ C A A 1 
(£) 



The resulting algorithm is depicted in Algorithm 3. 



choose X 1:n e K" 
for fe = 1,2,... do 
for I = 1 : n do 

~ 7r£ rop (£) d£ {proposed candidate} 
w~W[0,l] 

if u < ae{Xe 1 Ye) then 

Xe^Ye 
end if 
end for 
end for 



Algorithm 3: Metropolis within Gibbs sampler. We can go through the component indices 
in a random way. 
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